SynthAssess Report

Original Data Sample
age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country income
44 4 116391 9 13 4 9 1 4 1 0 0 40 38 0
21 4 319163 15 10 4 13 3 2 1 0 0 40 38 0
25 4 232914 9 13 2 1 5 2 0 0 0 38 38 0
21 4 174907 11 9 4 1 3 4 0 0 0 40 38 0
65 4 330144 15 10 2 3 0 4 1 0 0 40 38 0
Synthetic Data Sample
age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country income
24.1 4.0 309009.2 15.0 10.8 4.0 7.0 4.0 2.0 0.0 50.1 0.4 11.5 38.0 0.0
33.7 4.0 158875.9 15.0 15.9 2.0 3.0 0.0 4.0 1.0 6619.5 0.0 51.1 38.0 1.0
39.5 4.0 164406.6 15.0 9.0 2.0 0.0 0.0 4.0 1.0 4992.9 0.2 47.4 38.0 1.0
57.1 0.0 130439.5 15.0 13.0 2.0 12.0 0.0 4.0 0.0 64.2 2.9 35.7 38.0 0.0
23.3 4.0 208924.1 15.0 10.0 4.0 2.0 3.0 4.0 1.0 0.0 0.0 38.8 38.0 0.0
Range Coverage
Column               Range Coverage (%)
age                  100.0
fnlwgt               100.0
education            100.0
education-num        100.0
marital-status       100.0
occupation           100.0
relationship         100.0
race                 100.0
sex                  100.0
capital-gain         100.0
capital-loss         100.0
hours-per-week       100.0
income               100.0
native-country       97.5
workclass            87.5
Mean Range Coverage  99.0
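The coverage values above are consistent with a score that subtracts the share of the original value range that the synthetic column fails to reach. The report does not state its formula, so the following is a minimal sketch under that assumption; the function name `range_coverage` is hypothetical, and the min/max inputs are taken from the descriptive statistics tables of this report.

```python
def range_coverage(orig_min, orig_max, synth_min, synth_max):
    """Percentage of the original [min, max] range reached by the synthetic column.

    Assumed definition (not stated in the report): subtract the share of the
    original range missed at the lower and upper ends, then clip at 0.
    """
    span = orig_max - orig_min
    if span == 0:
        # Single-valued original column: covered only if the synthetic range contains it.
        return 100.0 if synth_min <= orig_min <= synth_max else 0.0
    missed_low = max(0.0, (synth_min - orig_min) / span)
    missed_high = max(0.0, (orig_max - synth_max) / span)
    return 100.0 * max(0.0, 1.0 - (missed_low + missed_high))


# min/max values copied from the descriptive statistics of this report
workclass = range_coverage(0, 8, 0, 7)         # synthetic max 7 vs original max 8
native_country = range_coverage(0, 40, 0, 39)  # synthetic max 39 vs original max 40
age = range_coverage(17, 90, 17, 90)           # identical ranges
print(workclass, native_country, age)
```

Under this assumed definition, workclass and native-country come out at 87.5 and 97.5, as reported. With the other 13 columns at 100.0, the mean over all 15 columns is 99.0, which agrees with the reported Mean Range Coverage.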
Descriptive Statistics for Original Data
index age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country income
count 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00
mean 38.65 3.86 189037.44 10.35 10.07 2.58 5.72 1.43 3.66 0.67 1110.60 84.77 40.29 35.90 0.24
std 13.60 1.48 101210.51 3.82 2.57 1.49 3.99 1.60 0.86 0.47 7774.42 397.07 12.34 7.39 0.43
min 17.00 0.00 13769.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00
25% 28.00 4.00 119429.50 9.00 9.00 2.00 2.00 0.00 4.00 0.00 0.00 0.00 40.00 38.00 0.00
50% 37.00 4.00 178513.50 11.00 10.00 2.00 6.00 1.00 4.00 1.00 0.00 0.00 40.00 38.00 0.00
75% 47.00 4.00 239548.25 12.00 12.00 4.00 9.00 3.00 4.00 1.00 0.00 0.00 45.00 38.00 0.00
max 90.00 8.00 816750.00 15.00 16.00 6.00 13.00 5.00 4.00 1.00 99999.00 3004.00 99.00 40.00 1.00
Descriptive Statistics for Synthetic Data
index age workclass fnlwgt education education-num marital-status occupation relationship race sex capital-gain capital-loss hours-per-week native-country income
count 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00 4000.00
mean 38.34 3.64 186132.74 14.72 9.93 2.37 4.50 1.29 3.68 0.70 873.66 72.15 40.51 35.71 0.23
std 11.48 1.55 76989.75 1.70 2.32 1.52 4.22 1.57 0.87 0.46 5952.85 346.02 8.91 8.56 0.42
min 17.00 0.00 13769.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00
25% 29.30 4.00 135897.85 15.00 9.00 2.00 1.00 0.00 4.00 0.00 0.00 0.00 37.90 38.00 0.00
50% 37.60 4.00 177197.20 15.00 10.00 2.00 3.00 1.00 4.00 1.00 84.50 0.00 40.50 38.00 0.00
75% 46.00 4.00 225299.72 15.00 11.80 4.00 9.00 3.00 4.00 1.00 304.45 0.60 43.72 38.00 0.00
max 90.00 7.00 816750.00 15.00 16.00 6.00 13.00 5.00 4.00 1.00 99999.00 3004.00 99.00 39.00 1.00
Comparison of Descriptive Statistics
Bivariate Correlation Matrix
Scatter Plot Comparison
Descriptive Statistics as Radar Chart

Average k-NN Distance for Original Samples: {'count': 40000.0, 'mean': 1098.1654830127793, 'std': 4644.89962537683, 'min': 13.797767368835443, '25%': 134.61503432135922, '50%': 244.73638495974663, '75%': 522.78751670575, 'max': 139440.33422961188}

Average k-NN Distance for Synthetic Samples: {'count': 40000.0, 'mean': 1040.3945121280071, 'std': 5814.03574832946, 'min': 24.01601792089142, '25%': 178.83793994840187, '50%': 295.42783479444137, '75%': 626.2960165255787, 'max': 275155.0748523069}

Average Neighbours for Original Samples: {'count': 40000.0, 'mean': 0.5, 'std': 0.5000062501171899, 'min': 0.0, '25%': 0.0, '50%': 0.5, '75%': 1.0, 'max': 1.0}
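The report does not say how the k-NN distances are computed. The following is a minimal sketch of one common approach, assuming Euclidean distance on numerically encoded rows and, for each record, the mean distance to its k nearest rows in a reference set. The function name `avg_knn_distance`, the value of k and the toy rows are all hypothetical.

```python
import math


def avg_knn_distance(queries, reference, k=3):
    """For each query row, return the mean Euclidean distance to its k nearest rows in `reference`."""
    if not 1 <= k <= len(reference):
        raise ValueError("k must be between 1 and len(reference)")
    result = []
    for q in queries:
        # Brute-force search: fine for a sketch, too slow for large tables.
        dists = sorted(math.dist(q, r) for r in reference)
        result.append(sum(dists[:k]) / k)
    return result


# Toy rows (hypothetical, not taken from the report).
reference = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (100.0, 100.0)]
queries = [(0.0, 0.0), (6.0, 8.0)]
distances = avg_knn_distance(queries, reference, k=2)
print(distances)
```

A raw Euclidean distance is dominated by columns with large magnitudes, such as fnlwgt and capital-gain in this dataset, so features are usually scaled before distances are compared.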

k-NN Distance Benchmark
NNeighbours for Original Sample

Privacy Matrix Difference: 59.64914051276453

Privacy Matrix

Each privacy evaluation below reports success rates for three attacks:

Main: the main privacy attack, in which the attacker uses the synthetic data to guess information about records in the original data.

Baseline: the baseline attack, which models a naive attacker who ignores the synthetic data and guesses randomly.

Control: the control privacy attack, in which the attacker uses the synthetic data to guess information about records in the control dataset.

Singling Out Results

Overall Singling Out PrivacyRisk(value=0.1064862456607837, ci=(0.0, 0.23589247180835815))

Main: SuccessRate(value=0.18425818370534194, error=0.10088397692500792)

Baseline: SuccessRate(value=0.035673799566679355, error=0.03567379956667936)

Control: SuccessRate(value=0.08704056055866004, error=0.0688098675681735)

Linkability Results

Overall Linkage PrivacyRisk(value=0.0, ci=(0.0, 0.04678075567856615))

Main: SuccessRate(value=0.035673799566679355, error=0.03567379956667936)

Baseline: SuccessRate(value=0.05424684758401218, error=0.05070758831236596)

Control: SuccessRate(value=0.05424684758401218, error=0.05070758831236596)
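The PrivacyRisk values in the Singling Out and Linkability results above are consistent with normalising the main attack's success rate against the control attack's, then clipping at zero. The report does not state the formula, so this is a minimal sketch under that assumption; the function name `privacy_risk` is hypothetical.

```python
def privacy_risk(main_rate, control_rate):
    """Excess success of the main attack over the control attack, rescaled to [0, 1].

    Assumed formula: (main - control) / (1 - control), clipped at 0 when the
    main attack does no better than the control attack.
    """
    if control_rate >= 1.0:
        return 0.0  # control attack always succeeds, so no excess risk is measurable
    return max(0.0, (main_rate - control_rate) / (1.0 - control_rate))


# Success rates copied from the Singling Out and Linkability results.
singling_out = privacy_risk(0.18425818370534194, 0.08704056055866004)
linkability = privacy_risk(0.035673799566679355, 0.05424684758401218)
print(singling_out, linkability)
```

Both calls agree with the reported values: about 0.1065 for singling out, and 0.0 for linkability, where the main attack (0.036) does not beat the control attack (0.054).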

Inference Results

Inference Attack
Original Data Regression Report
Metric                          Value   95% CI Lower Bound  95% CI Upper Bound
Mean Squared Error              110.03  98.73               122.25
Mean Absolute Error             8.05    7.66                8.51
Mean Absolute Percentage Error  21.59   20.45               22.76
Synthetic Data Regression Report
Metric                          Value   95% CI Lower Bound  95% CI Upper Bound
Mean Squared Error              121.97  108.45              137.40
Mean Absolute Error             8.38    7.96                8.82
Mean Absolute Percentage Error  21.67   20.58               22.64
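The three metrics in the regression reports above can be computed from paired true and predicted values. The following is a minimal sketch using the standard definitions. The report does not state the regression target or how the confidence intervals were obtained, so the toy values are hypothetical and no interval is computed. Expressing MAPE in percent is also an assumption about the 21.59 and 21.67 figures.

```python
def regression_metrics(y_true, y_pred):
    """Return MSE, MAE and MAPE (in percent) for paired true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when a true value is zero; this sketch assumes none are.
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    return {"mse": mse, "mae": mae, "mape": mape}


# Toy values (hypothetical, not taken from the report).
y_true = [40.0, 50.0, 20.0, 10.0]
y_pred = [38.0, 55.0, 20.0, 12.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

MSE squares each error, so it is more sensitive to a few large misses than MAE, which is one reason the two can move differently between the original and synthetic reports.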
Residual Plot
Data Discriminator Original X Synthetic
index         precision  recall  f1-score  support
0             0.99       0.99    0.99      804.00
1             0.99       0.99    0.99      796.00
accuracy                         0.99      1600.00
macro avg     0.99       0.99    0.99      1600.00
weighted avg  0.99       0.99    0.99      1600.00
Feature Importance Original X Synthetic
Data Discriminator Original X Holdout
index         precision  recall  f1-score  support
0             0.73       0.64    0.68      199.0
1             0.68       0.77    0.72      201.0
accuracy                         0.70      400.0
macro avg     0.71       0.70    0.70      400.0
weighted avg  0.71       0.70    0.70      400.0
Feature Importance Original X Holdout
Data Discriminator Synthetic X Holdout
index         precision  recall  f1-score  support
0             0.98       0.97    0.98      199.00
1             0.98       0.99    0.98      201.00
accuracy                         0.98      400.00
macro avg     0.98       0.98    0.98      400.00
weighted avg  0.98       0.98    0.98      400.00
Feature Importance Synthetic X Holdout
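
The discriminator tables above report per-class precision, recall and F1 for a classifier trained to tell two datasets apart. The following is a minimal sketch of how those columns follow from confusion-matrix counts. The counts are hypothetical because the report does not include confusion matrices, and the function name `class_metrics` is illustrative.

```python
def class_metrics(tp, fp, fn):
    """Precision, recall and F1 for one class from its confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for one class of a two-class discriminator.
precision, recall, f1 = class_metrics(tp=150, fp=50, fn=50)
print(precision, recall, f1)
```

With roughly balanced classes, as in these tables, accuracy near 0.5 means the classifier cannot tell the two datasets apart, while accuracy near 1.0 means it separates them almost perfectly.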